Acute pain and chronic kidney disease (CKD) are two of the most serious and phenotypically variable complications in sickle cell disease (SCD). To bring more consistency to how these complications are quantified, we developed the Sickle Cell Organ Grading System (SCOGS), a clinician-designed 5-point scale assigning severity grades to 51 SCD-specific complications across major organ systems. Acute pain grades range from 1 to 5, starting with pain managed at home (grade 1) and ending with pain-related death (grade 5). Intermediate grades reflect increasing severity: outpatient care without admission (grade 2), inpatient admission without organ failure (grade 3), and admissions involving organ failure or life-threatening complications (grade 4). CKD grades are based on a combination of estimated glomerular filtration rate (eGFR) and albuminuria, with grade 1 indicating mild impairment and grade 5 defined as end-stage kidney disease or death due to CKD.
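
The acute pain rules above can be written as a small decision function. The following is a minimal sketch only: the `PainEpisode` record, its field names, and the `acute_pain_grade` helper are hypothetical illustrations and not part of SCOGS itself. CKD grading is not sketched because the eGFR and albuminuria cut-offs are not given here.

```python
from dataclasses import dataclass


@dataclass
class PainEpisode:
    """Hypothetical record of one acute pain episode (field names are assumptions)."""
    care_setting: str  # "home", "outpatient", or "inpatient"
    organ_failure_or_life_threatening: bool = False
    pain_related_death: bool = False


def acute_pain_grade(episode: PainEpisode) -> int:
    """Map an episode to a grade from 1 to 5 following the description above."""
    if episode.pain_related_death:
        return 5  # pain-related death
    if episode.care_setting == "inpatient":
        # Admission with organ failure or a life-threatening complication is grade 4;
        # admission without organ failure is grade 3.
        return 4 if episode.organ_failure_or_life_threatening else 3
    if episode.care_setting == "outpatient":
        return 2  # outpatient care without admission
    if episode.care_setting == "home":
        return 1  # pain managed at home
    raise ValueError(f"unknown care setting: {episode.care_setting!r}")


grades = [
    acute_pain_grade(PainEpisode("home")),
    acute_pain_grade(PainEpisode("outpatient")),
    acute_pain_grade(PainEpisode("inpatient")),
    acute_pain_grade(PainEpisode("inpatient", organ_failure_or_life_threatening=True)),
    acute_pain_grade(PainEpisode("inpatient", pain_related_death=True)),
]
print(grades)
```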

While SCOGS provides a useful system for complication-specific phenotyping, missing data often limits its use in clinical practice or when applied to existing registry data. In the Sickle Cell Clinical Research and Intervention Program (SCCRIP) cohort study, nearly all patients (98.8%) had enough information to assign a pain grade, but only 21% could be graded for CKD. To address this limitation, we asked whether machine learning (ML) could (1) reproduce clinician-assigned SCOGS grades using available clinical data beyond those used in SCOGS classification, and (2) impute missing grades by learning patterns in related variables.

We trained random forest classifiers using longitudinal SCCRIP data, including labs, medications, anthropometrics, and PedsQL SCD module quality of life scores. Acute pain and CKD models were developed separately. Model performance was evaluated using five-fold cross-validation, with predictive accuracy assessed by area under the receiver operating characteristic curve (AUC). Feature importance was analyzed using SHAP (SHapley Additive exPlanations) to interpret model behavior. Both models performed well, achieving an AUC of 0.97 and 90% accuracy for pain, and an AUC of 0.98 with 97% accuracy for CKD, showing that ML could replicate clinician grading with high accuracy. For acute pain, key predictors included hemoglobin, transfusion history, age, BMI, and red blood cell count. For CKD, key features included hemoglobin, creatinine, transfusion exposure, and treatment history. Some less conventional features also appeared, but we interpreted them as reflecting statistical correlation rather than causal clinical factors. Overall, the models relied on patterns that reflected known clinical reasoning, supporting the internal consistency of the SCOGS.
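
The evaluation loop described above can be illustrated roughly as follows. This is a sketch under stated assumptions, not the study's pipeline: it uses scikit-learn on synthetic data from `make_classification` in place of SCCRIP features, the hyperparameters are arbitrary, and the multi-class AUC is computed as a one-vs-rest macro average, a choice the abstract does not specify. The SHAP analysis is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for the clinical feature matrix (assumption: 12 numeric features).
X, y = make_classification(
    n_samples=500,
    n_features=12,
    n_informative=6,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=0,
)
y = y + 1  # relabel classes 0..4 as grades 1..5

model = RandomForestClassifier(n_estimators=100, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold class probabilities: each sample is scored by a model
# that did not see it during training.
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")

classes = np.unique(y)  # probability columns follow sorted class order
pred = classes[proba.argmax(axis=1)]

# Assumption: one-vs-rest, macro-averaged AUC across the five grades.
auc = roc_auc_score(y, proba, multi_class="ovr", average="macro")
acc = accuracy_score(y, pred)
print(f"AUC={auc:.2f} accuracy={acc:.2f}")
```

Stratified folds keep every grade represented in each training split, which matters when some grades are rare.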

In addition, ML helped extend SCOGS grading to patients who would otherwise go ungraded. Using model-based imputation, we increased CKD grading coverage from 21% to over 90%. Although we could not validate the imputed grades for previously ungraded registry patients because no reference grades exist for them, cross-validation in the labeled dataset showed over 90% agreement between model-predicted and clinician-assigned grades based on confusion matrix analysis. This suggests the models can reliably infer SCOGS grades from routine clinical data, even when key variables are missing. However, the models are not yet ready for use in high-stakes clinical or research settings. We plan to conduct sensitivity analyses and test the models in independent cohorts to better understand their reliability and generalizability. The results will guide future improvements, including updates to model structure and the selection of input features.
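
To make the two quantities in this paragraph concrete, the sketch below computes agreement as the diagonal share of a confusion matrix and coverage as the fraction of patients with a grade after model-based fill-in. All function names and the toy values are made up for illustration and do not come from SCCRIP.

```python
def confusion_matrix(true_grades, pred_grades, grades=(1, 2, 3, 4, 5)):
    """Rows are clinician-assigned grades, columns are model-predicted grades."""
    index = {g: i for i, g in enumerate(grades)}
    matrix = [[0] * len(grades) for _ in grades]
    for t, p in zip(true_grades, pred_grades):
        matrix[index[t]][index[p]] += 1
    return matrix


def agreement(matrix):
    """Fraction of cases on the diagonal, i.e. exact grade agreement."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return diagonal / total


def impute(clinician_grades, model_grades):
    """Keep clinician grades where present; otherwise use the model grade.

    None marks a missing grade. Returns the filled list and its coverage.
    """
    filled = [c if c is not None else m for c, m in zip(clinician_grades, model_grades)]
    coverage = sum(g is not None for g in filled) / len(filled)
    return filled, coverage


# Toy labeled data (illustrative values only).
clinician = [1, 2, 3, 3, 4, 5, 2, 1, 3, 4]
predicted = [1, 2, 3, 2, 4, 5, 2, 1, 3, 4]
agree = agreement(confusion_matrix(clinician, predicted))

# Toy registry slice: three of five patients lack a clinician grade.
partial = [3, None, None, 1, None]
model_out = [3, 2, None, 1, 4]  # the model may also fail to grade a patient
filled, coverage = impute(partial, model_out)
print(agree, filled, coverage)
```

Note that agreement can only be measured on the labeled subset; the coverage gain applies to patients for whom no such check is possible.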

In summary, ML can closely mirror expert decision-making in assigning disease severity and extend the reach of grading systems like SCOGS by addressing missing data. This approach makes structured phenotyping more broadly applicable across diverse patient populations and provides a foundation for scalable, individualized care for SCD in real-world settings.